Variational Inference for Policy Gradient
Author
Abstract
Inspired by the seminal work on Stein Variational Inference [2] and Stein Variational Policy Gradient [3], we derive a method to generate samples from the posterior parameter distribution by explicitly minimizing, in an amortized fashion, the KL divergence between the variational distribution and the target distribution. We then apply this variational inference technique to vanilla policy gradient, TRPO, and PPO with Bayesian Neural Network parameterizations for reinforcement learning problems.

1 Parametric Minimization of KL Divergence

Suppose we have a random sample from a base distribution $\xi \sim q_0(\xi)$, e.g. $q_0 = \mathcal{N}(0, I)$. We can then generate an induced distribution $q_\phi(\theta)$ through a general invertible and differentiable transformation $\theta = h_\phi(\xi)$ (see Appendix A). Our goal is to treat $q_\phi(\theta)$ as a variational distribution matching the true distribution $p(\theta)$, i.e. to minimize $J = \mathrm{KL}(q_\phi(\theta) \,\|\, p(\theta))$.

Lemma 1.
$$H(q) = H(q_0) + \mathbb{E}_{\xi \sim q_0}\!\left[\log \det\!\left(\frac{\partial h_\phi(\xi)}{\partial \xi}\right)\right] \tag{1}$$

With (1), we obtain the following identity for $\mathrm{KL}(q_\phi(\theta) \,\|\, p(\theta))$:
$$\begin{aligned}
\mathrm{KL}(q \,\|\, p) &= -H(q) - \mathbb{E}_{q(\theta)}[\log p(\theta)] \\
&= -H(q_0) - \mathbb{E}_{\xi \sim q_0}\!\left[\log \det\!\left(\frac{\partial h_\phi(\xi)}{\partial \xi}\right)\right] - \mathbb{E}_{\xi \sim q_0}[\log p(h_\phi(\xi))] \,\ldots
\end{aligned}$$
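To make the objective above concrete, the following is a minimal sketch of minimizing this KL divergence by Monte Carlo over $\xi \sim q_0$. It assumes PyTorch, a diagonal affine transformation $h_\phi(\xi) = \mu + \exp(s) \odot \xi$, and a toy Gaussian target $p(\theta)$; these specifics are illustrative assumptions rather than the paper's actual setup. Since $H(q_0)$ does not depend on $\phi$, it suffices to minimize $-\mathbb{E}_{\xi \sim q_0}[\log \det(\partial h_\phi/\partial \xi)] - \mathbb{E}_{\xi \sim q_0}[\log p(h_\phi(\xi))]$.

```python
# A minimal sketch, assuming PyTorch, a diagonal affine map h_phi(xi) = mu + exp(s) * xi,
# and a toy Gaussian target p(theta); none of these specifics come from the paper itself.
import torch

dim = 2
mu = torch.zeros(dim, requires_grad=True)   # variational parameters phi = (mu, s)
s = torch.zeros(dim, requires_grad=True)    # log-scales, so the Jacobian is diag(exp(s))

def log_p(theta):
    # Unnormalized log-density of a hypothetical target: N(3, 0.5^2) in each dimension.
    return -0.5 * (((theta - 3.0) / 0.5) ** 2).sum(dim=-1)

opt = torch.optim.Adam([mu, s], lr=1e-2)
for _ in range(2000):
    xi = torch.randn(64, dim)               # xi ~ q0 = N(0, I)
    theta = mu + torch.exp(s) * xi          # theta = h_phi(xi)
    log_det = s.sum()                       # log det(d h_phi / d xi) for the affine map
    # KL(q_phi || p) up to the constant H(q0): -E[log det] - E[log p(h_phi(xi))]
    loss = -(log_det + log_p(theta).mean())
    opt.zero_grad()
    loss.backward()
    opt.step()

print(mu.detach(), s.exp().detach())        # should approach mean ~3 and std ~0.5
```

The same recipe carries over to more expressive invertible transformations (e.g. stacked flow layers); only the log-determinant term changes.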
Similar articles
Combine Monte Carlo with Exhaustive Search: Effective Variational Inference and Policy Gradient Reinforcement Learning
In this paper we discuss very preliminary work on how we can reduce the variance in black box variational inference based on a framework that combines Monte Carlo with exhaustive search. We also discuss how Monte Carlo and exhaustive search can be combined to deal with infinite dimensional discrete spaces. Our method builds upon and extends a recently proposed algorithm that constructs stochast...
Stein Variational Policy Gradient
Policy gradient methods have been successfully applied to many complex reinforcement learning problems. However, policy gradient methods suffer from high variance, slow convergence, and inefficient exploration. In this work, we introduce a maximum entropy policy optimization framework which explicitly encourages parameter exploration, and show that this framework can be reduced to a Bayesian in...
Stochastic Variational Inference with Gradient Linearization
Variational inference has experienced a recent surge in popularity owing to stochastic approaches, which have yielded practical tools for a wide range of model classes. A key benefit is that stochastic variational inference obviates the tedious process of deriving analytical expressions for closed-form variable updates. Instead, one simply needs to derive the gradient of the log-posterior, whic...
Natural Gradients via the Variational Predictive Distribution
Variational inference transforms posterior inference into parametric optimization thereby enabling the use of latent variable models where it would otherwise be impractical. However, variational inference can be finicky when different variational parameters control variables that are strongly correlated under the model. Traditional natural gradients that use the variational approximation fail t...
Two Methods for Wild Variational Inference
Variational inference provides a powerful tool for approximate probabilistic inference on complex, structured models. Typical variational inference methods, however, require the use of inference networks with computationally tractable probability density functions. This largely limits the design and implementation of variational inference methods. We consider wild variational inference methods that...
Journal: CoRR
Volume: abs/1802.07833
Pages: -
Publication year: 2018